ABSTRACT


The concept of weighted distributions can be traced to the study of the
effects of methods of ascertainment upon the estimation of frequencies by
Fisher in 1934, and it was formulated in general terms by the author in a
paper presented at the First International Symposium on Classical and
Contagious Distributions held in Montreal in 1963. Since then, a number of
papers have appeared on the subject. This paper reviews some previous work,
points out, through appropriate examples, some situations where weighted
distributions arise, and discusses the associated methods of statistical
analysis.

The importance of specifying the class of underlying probability
distributions (or stochastic model) in data analysis, based on a detailed
knowledge of how the data are obtained, is emphasized. Failure to take into
account the conditions of ascertainment of data can lead to wrong conclusions.


Keywords and Phrases: Damage models, nonresponse, probability sampling,
quadrat sampling, size biased sampling, truncation, weighted distributions.
1. IMPORTANCE OF SPECIFICATION

For drawing valid inferences from observed data through statistical
methodology it is necessary to identify the proper sample space (all possible
outcomes) and specify the class of probability distributions (the model) to
which the true distribution of the observations belongs. More precisely, the
observed data set $x$ has to be considered as the result of a random experiment,
i.e., as a realization of a random variable (rv) $X$ taking values in a space
$\mathcal{X}$ and subject to a probability distribution $P$ belonging to a
specified class $\mathcal{P}$. Such knowledge enables us to write down the
probability (or probability density) of $x$ for given $P$, which we write as
$L(P \mid x)$. The function $L(P \mid x)$ defined over $\mathcal{P}$ for given $x$,
called the likelihood, together with any a priori information we may have on
$P$, forms the basis of statistical inference. The specification of
$\mathcal{P}$, or the model as it is sometimes called, is thus a datum of the
problem of inference. However, not much attention is given to this problem in
statistical theory or practice, despite the emphasis given to it by pioneers
in statistics like Karl Pearson and R. A. Fisher. Wrong specification may lead
to invalid inference, which is sometimes referred to as a third kind of error,
the first two being the familiar ones associated with the Neyman-Pearson
theory of testing of hypotheses.

It is almost axiomatic to say, although it may need some effort to
demonstrate, that inference based on a specification $\mathcal{P}_1$ is
possibly more precise than that based on $\mathcal{P}_2$ if
$\mathcal{P}_1 \subset \mathcal{P}_2$, provided $\mathcal{P}_1$ includes the true
distribution. It is therefore of considerable value to specify the smallest
possible set (see Altham, 1984; Bishop, Fienberg and Holland, 1975, p. 313).
Past experience may help in the choice of such a set. But it should also be
possible to start with a wider set and narrow it down using the observed data
themselves, although the appropriate methodology for this purpose is not fully
developed. Statisticians, on the other hand, seem to be content with studies on
robustness, i.e., with determining the widest class $\mathcal{P}$ for which a
given statistical procedure is valid.

The problem of specification is not a simple one. A detailed knowledge of
the procedure actually employed in acquiring data is an essential ingredient
in arriving at a proper specification. The situation is more complicated with
field observations and nonexperimental data, where nature produces events
according to a certain model and these events are observed and recorded by
investigators. There does not always exist a suitable sampling frame of the
kind needed to apply classical sampling theory; one needs to work with
visibility analysis instead. In practice, it is not always possible to observe
and record events as they occur. For instance, certain events may not be
observable by the method we employ and are therefore missed in the record
(truncated, censored, and incomplete samples). Or an event may be observable
only with a certain probability, depending on the nature of the event, such as
its conspicuousness, and on the procedure employed to observe it (unequal
probability sampling). Or an event may change in a random way by the time of,
or during, the process of observation, so that what comes on record is a
modified event (damage models). Sometimes, events produced under two or more
different mechanisms with unspecified relative frequencies get mixed up and
brought into the same record (outliers, contaminated samples). In all these
cases, the specified class $\mathcal{P}$ for the original events (as they occur)
may not be appropriate for the events as they are recorded (observed data)
unless it is suitably modified.


In a classical paper, Fisher (1934) demonstrated the need for such
adjustment in specification depending on the way the data are ascertained. In
extending the basic ideas of Fisher, the author (Rao, 1965) introduced the
concept of a weighted distribution as a method of adjustment applicable to
many situations. In the present paper we discuss, through suitable examples,
some procedures for making adjustments in specification based on methods of
ascertaining data.

Although I have mentioned only field observations, which are collected
without the help of a suitable sampling frame, I must emphasize that similar
problems of specification arise with data collected in large scale sample
surveys and also with data acquired through field and laboratory experiments.
Survey practitioners are faced with problems of an incomplete frame, which
raise questions of the representativeness of a sample for a given population
(see Kruskal and Mosteller, 1980, and references therein); of nonresponse,
which raises questions of repeated visits to sampled units, substitution of
nonresponding units by others with possibly similar characteristics, and
imputation of values (Fienberg and Tanur, 1983; Fienberg and Stasny, 1983;
Rubin, 1976, 1980); and of nonsampling errors, which raise questions about
their recognition, detection and measurement, and about the adjustments needed
in expressing the precision of estimates (Mahalanobis, 1944; Mosteller, 1978).
Similarly, in the design of experiments, difficulties in random allocation of
treatments and choice of controls in field trials, pooling of evidence from
different experiments conducted over space and time, and missing values
(drop-outs) introduce additional uncertainties in statistical inference and in
the interpretation of results for practical use or policy purposes (for
typical problems and references see Fienberg, Singer and Tanur, 1984; Neyman,
1977).


2. TRUNCATION AND CENSORING


Some events, although they occur, may be unascertainable, so that the
observed distribution is truncated to a certain region of the sample space.
An example is the frequency of families with both parents heterozygous for
albinism but having no albino children. There is no evidence that the parents
are heterozygous unless they have an albino child, and families with such
parents but no albino children are confounded with normal families. The actual
frequency of the event 'zero albino children' is, thus, not ascertainable. In
such a case it is simple to adjust the probability distribution so that it
applies to the observable events.

In general, if $p(x,\theta)$ is the pdf (probability density function) of $X$,
where $\theta$ denotes unknown parameters, and $X$ is truncated to a specified
region $T \subset \mathcal{X}$, then the pdf of the truncated random variable
$X^w$ is

$$p^w(x,\theta) = \frac{w(x,T)\,p(x,\theta)}{u(T,\theta)} \tag{2.1}$$

where $w(x,T) = 1$ if $x \in T$ and $w(x,T) = 0$ if $x \notin T$, and
$u(T,\theta) = E[w(X,T)]$. If $x_1,\dots,x_n$ are independent observations
subject to truncation, then the likelihood is

$$\frac{p(x_1,\theta)\cdots p(x_n,\theta)}{[u(T,\theta)]^n}. \tag{2.2}$$

In some cases we may have independent observations $x_1,\dots,x_n$ arising from
a truncated distribution in addition to a number $m$ (but not the actual
values) of observations falling outside $T$. Then the likelihood is

$$\propto p(x_1,\theta)\cdots p(x_n,\theta)\,[1-u(T,\theta)]^m. \tag{2.3}$$
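As an illustration of (2.2), consider the albinism example with
$T = \{1,\dots,n\}$: the zero class is unobservable, so the observed number of
albino children follows a zero-truncated binomial. The sketch below assumes,
as a simplification not made in the text, that every ascertained family has
the same size $n$. It uses the fact that the truncated binomial is still an
exponential family in $x$, so the maximum likelihood estimate of $\pi$ equates
the truncated mean $n\pi/[1-(1-\pi)^n]$ to the sample mean. The function names
and the data are hypothetical.

```python
def truncated_mean(pi: float, n: int) -> float:
    """Mean of a binomial(n, pi) variable truncated to {1, ..., n}."""
    return n * pi / (1.0 - (1.0 - pi) ** n)


def truncated_binomial_mle(counts: list[int], n: int, tol: float = 1e-12) -> float:
    """ML estimate of pi from zero-truncated binomial(n, pi) counts, likelihood (2.2).

    The score equation reduces to truncated_mean(pi, n) == sample mean; the
    truncated mean increases from 1 (pi -> 0) to n (pi = 1), so bisection on
    (0, 1) finds the unique root.
    """
    xbar = sum(counts) / len(counts)
    if not 1.0 < xbar < n:
        raise ValueError("sample mean must lie strictly between 1 and n")
    lo, hi = 1e-9, 1.0 - 1e-12
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if truncated_mean(mid, n) < xbar:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical data: albino counts in 10 ascertained sibships of size 4.
counts = [1, 1, 2, 1, 1, 3, 1, 2, 1, 1]
pi_hat = truncated_binomial_mle(counts, n=4)
naive = sum(counts) / (4 * len(counts))  # ignores truncation, so biased upward
print(f"truncation-adjusted pi = {pi_hat:.3f}, naive pi = {naive:.3f}")
```

The naive estimate treats the ascertained families as a random sample of all
families with heterozygous parents and therefore overstates $\pi$.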

A more complicated case is the following. Suppose that we have a measuring
device which records the time at which a bulb fails. If we are experimenting
with $n$ bulbs in a life testing problem using a measuring device which may
itself fail at a random time, then the observations would be of the type

$$x_1,\dots,x_{n_1},\; n_2,\; n_3 \tag{2.4}$$

where $x_1,\dots,x_{n_1}$ are the lifetimes of the $n_1$ bulbs recorded before an
unknown time point $T$ at which the measuring device failed, $n_2$ is the number
of bulbs that failed between $T$ and $T_0$, the known time at which the
experiment was terminated, and $n_3$ is the number of bulbs still burning after
$T_0$. Let

$$w_1(T,\theta) = P(X \le T), \quad w_2(T,\theta) = P(T < X \le T_0), \quad w_3(T,\theta) = 1 - w_1(T,\theta) - w_2(T,\theta).$$

Then the likelihood based on the data (2.4) is

$$\frac{n!}{n_2!\,n_3!}\,p(x_1,\theta)\cdots p(x_{n_1},\theta)\,[w_2(T,\theta)]^{n_2}\,[w_3(T,\theta)]^{n_3} \tag{2.5}$$

where $T$ is unknown besides the basic parameters $\theta$. Inference on $T$ and
$\theta$ based on (2.5) does not seem to have been fully worked out but could be
developed on standard lines.
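One way of developing inference from (2.5) along standard lines is sketched
below under an assumption the text does not make: exponential lifetimes,
$p(x,\theta) = \theta e^{-\theta x}$, so that $w_2 = e^{-\theta T} - e^{-\theta T_0}$
and $w_3 = e^{-\theta T_0}$. For fixed $\theta$, (2.5) depends on $T$ only through
$w_2$, which decreases in $T$, while the recorded lifetimes force
$T \ge \max x_i$; hence, when $n_2 > 0$, the maximum likelihood estimate is
$\hat T = \max x_i$, and $\theta$ is then found by a one-dimensional search. The
function names, the simulated data and the golden-section search are
illustrative choices.

```python
import math
import random


def log_lik(theta, t, n1, sum_x, n2, n3, t0):
    """Log of (2.5) without the constant, assuming exponential lifetimes.

    Uses log w2 = -theta*t + log(1 - exp(-theta*(t0 - t))), which requires t < t0.
    """
    log_w2 = -theta * t + math.log(-math.expm1(-theta * (t0 - t)))
    return n1 * math.log(theta) - theta * sum_x + n2 * log_w2 - n3 * theta * t0


def fit(x, n2, n3, t0, lo=1e-6, hi=50.0, iters=200):
    """Return (theta_hat, T_hat); T_hat is the largest recorded lifetime (assumes n2 > 0)."""
    t_hat = max(x)
    n1, sum_x = len(x), sum(x)

    def f(theta):
        return log_lik(theta, t_hat, n1, sum_x, n2, n3, t0)

    g = (math.sqrt(5.0) - 1.0) / 2.0  # golden-section search; f is concave in theta
    a, b = lo, hi
    for _ in range(iters):
        c, d = b - g * (b - a), a + g * (b - a)
        if f(c) > f(d):
            b = d
        else:
            a = c
    return 0.5 * (a + b), t_hat


# Illustrative simulation with assumed values: 2000 bulbs, rate 1, T = 0.8, T0 = 2.
rng = random.Random(1)
T_true, T0 = 0.8, 2.0
lives = [rng.expovariate(1.0) for _ in range(2000)]
x = [v for v in lives if v <= T_true]
n2 = sum(T_true < v <= T0 for v in lives)
n3 = sum(v > T0 for v in lives)
theta_hat, T_hat = fit(x, n2, n3, T0)
print(f"theta_hat = {theta_hat:.3f}, T_hat = {T_hat:.3f}")
```

As with the endpoint of a uniform distribution, $\hat T$ is biased slightly
downward; a full treatment would also need interval estimates for $T$.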


The expressions (2.2), (2.3), and (2.5) are simple examples of weighted
distributions, whose general definition is given in Section 3.


3. WEIGHTED DISTRIBUTIONS

In Section 2, we have considered situations where certain events are
unobservable. A more general case is one where an event that occurs has a
certain probability of being recorded (or included in the sample). Let $X$ be a
rv with $p(x,\theta)$ as the pdf, and suppose that when $X = x$ occurs, the
probability of recording it is $w(x,\alpha)$, depending on the observed value $x$
and possibly also on an unknown parameter $\alpha$. Then the pdf of the
resulting rv $X^w$ is

$$p^w(x,\theta,\alpha) = \frac{w(x,\alpha)\,p(x,\theta)}{E[w(X,\alpha)]}. \tag{3.1}$$

Although in deriving (3.1) we chose $w(x,\alpha)$ such that
$0 \le w(x,\alpha) \le 1$, we can define (3.1) for any non-negative weight
function $w(x,\alpha)$ for which $E[w(X,\alpha)]$ exists. The distribution (3.1)
obtained by using any non-negative weight function $w(x,\alpha)$ is called (see
Rao, 1965) a weighted version of $p(x,\theta)$.
In particular, the weighted distribution

$$p^w(x,\theta) = \frac{\lVert x \rVert\,p(x,\theta)}{E[\lVert X \rVert]}, \tag{3.2}$$

where $\lVert x \rVert$ is the norm or some measure of size of $x$, is called the
size biased distribution. When $x$ is univariate and non-negative, the weighted
distribution

$$p^w(x,\theta) = \frac{x\,p(x,\theta)}{E(X)} \tag{3.3}$$

is called the length (size) biased distribution. For example, if $X$ has the
logarithmic series distribution

$$\frac{\alpha^r}{-r\log(1-\alpha)}, \quad r = 1, 2, \dots \tag{3.4}$$

then the distribution of the size biased variable is

$$\alpha^{r-1}(1-\alpha), \quad r = 1, 2, \dots \tag{3.5}$$

which shows that $X^w - 1$ has a geometric distribution. A truncated geometric
distribution is sometimes found to provide a good fit to an observed
distribution of family size (Feller, 1966). But if the information on family
size has been ascertained from school children, then the observations would
have a size biased distribution. In such a case, a good fit of the geometric
distribution to the observed family sizes would indicate that the underlying
distribution of family size is, in fact, a logarithmic series.
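The step from (3.4) to (3.5) can be checked numerically: multiply the
logarithmic series probabilities by $r$, divide by
$E(X) = \alpha/[-(1-\alpha)\log(1-\alpha)]$, and compare with the geometric
probabilities. The sketch below truncates the infinite support at a large $r$,
which is harmless for $\alpha$ well below 1; $\alpha = 0.7$ is an arbitrary
illustrative value.

```python
import math


def log_series_pmf(r: int, alpha: float) -> float:
    """Logarithmic series probability (3.4)."""
    return alpha**r / (-r * math.log(1.0 - alpha))


def size_biased_pmf(r: int, alpha: float) -> float:
    """r * p(r) / E(X): the size biased form (3.3) of (3.4)."""
    mean = alpha / (-(1.0 - alpha) * math.log(1.0 - alpha))
    return r * log_series_pmf(r, alpha) / mean


alpha = 0.7
for r in range(1, 6):
    geometric = alpha ** (r - 1) * (1.0 - alpha)  # right-hand side of (3.5)
    print(r, round(size_biased_pmf(r, alpha), 6), round(geometric, 6))

# The size biased probabilities sum to 1 over r = 1, 2, ... (truncated at r = 400).
total = sum(size_biased_pmf(r, alpha) for r in range(1, 401))
```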


Table 1 gives a list of some basic distributions and their size biased
forms. It is seen that the size biased form belongs to the same family as the
original distribution in all cases except the logarithmic series (see Rao,
1965; Patil and Ord, 1975; Janardhan and Rao, 1983, for characterizations and
examples of size biased distributions).
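Table 1. Certain Basic Distributions and their Size-Biased Forms

| Random variable (rv) | pdf | Size-biased rv |
|---|---|---|
| Binomial, $B(n,p)$ | $\binom{n}{x}p^x(1-p)^{n-x}$ | $1 + B(n-1,p)$ |
| Negative binomial, $NB(k,p)$ | $\binom{k+x-1}{x}q^x p^k$ | $1 + NB(k+1,p)$ |
| Poisson, $Po(\lambda)$ | $e^{-\lambda}\lambda^x/x!$ | $1 + Po(\lambda)$ |
| Logarithmic series, $L(\alpha)$ | $\{-\log(1-\alpha)\}^{-1}\alpha^x/x$ | $1 + NB(1,1-\alpha)$ |
| Hypergeometric, $H(n,M,N)$ | $\binom{n}{x}M^{(x)}(N-M)^{(n-x)}/N^{(n)}$ | $1 + H(n-1,M-1,N-1)$ |
| Binomial beta, $BB(n,\alpha,\gamma)$ | $\binom{n}{x}B(\alpha+x,\gamma+n-x)/B(\alpha,\gamma)$ | $1 + BB(n-1,\alpha+1,\gamma)$ |
| Negative binomial beta, $NBB(k,\alpha,\gamma)$ | $\binom{k+x-1}{x}B(\alpha+x,\gamma+k)/B(\alpha,\gamma)$ | $1 + NBB(k+1,\alpha+1,\gamma-1)$ |
| Gamma, $G(\alpha,k)$ | $\alpha^k x^{k-1}e^{-\alpha x}/\Gamma(k)$ | $G(\alpha,k+1)$ |
| Beta first kind, $B_1(\delta,\gamma)$ | $x^{\delta-1}(1-x)^{\gamma-1}/B(\delta,\gamma)$ | $B_1(\delta+1,\gamma)$ |
| Beta second kind, $B_2(\delta,\gamma)$ | $x^{\delta-1}(1+x)^{-\gamma}/B(\delta,\gamma-\delta)$ | $B_2(\delta+1,\gamma)$ |
| Pearson type V, $Pe(k)$ | $x^{-(k+1)}\exp(-x^{-1})/\Gamma(k)$ | $Pe(k-1)$ |
| Pareto, $Pa(\alpha,\gamma)$ | $\gamma\alpha^{\gamma}x^{-(\gamma+1)},\ x \ge \alpha$ | $Pa(\alpha,\gamma-1)$ |
| Lognormal, $LN(\mu,\sigma^2)$ | $(2\pi\sigma^2)^{-1/2}x^{-1}\exp\{-(\log x-\mu)^2/(2\sigma^2)\}$ | $LN(\mu+\sigma^2,\sigma^2)$ |

Here $q = 1 - p$, $a^{(x)} = a(a-1)\cdots(a-x+1)$, and $B(\cdot,\cdot)$ in the pdf
column denotes the beta function.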
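Several discrete entries of Table 1 can be verified numerically by forming
$x\,p(x)/E(X)$ and comparing it with the shifted distribution in the last
column. The sketch below does this for the binomial, Poisson and negative
binomial rows using only the standard library; the parameter values are
arbitrary, and the negative binomial uses the table's parameterization
$\binom{k+x-1}{x}q^x p^k$, whose mean is $kq/p$.

```python
import math


def binom(x, n, p):
    return math.comb(n, x) * p**x * (1 - p) ** (n - x) if 0 <= x <= n else 0.0


def poisson(x, lam):
    return math.exp(-lam) * lam**x / math.factorial(x) if x >= 0 else 0.0


def negbin(x, k, p):
    return math.comb(k + x - 1, x) * (1 - p) ** x * p**k if x >= 0 else 0.0


def max_gap(size_biased, shifted, support):
    """Largest absolute difference between two pmfs over a finite support."""
    return max(abs(size_biased(x) - shifted(x)) for x in support)


n, p, lam, k = 10, 0.3, 2.5, 4
gap_binom = max_gap(lambda x: x * binom(x, n, p) / (n * p),
                    lambda x: binom(x - 1, n - 1, p), range(0, n + 1))
gap_pois = max_gap(lambda x: x * poisson(x, lam) / lam,
                   lambda x: poisson(x - 1, lam), range(0, 60))
gap_nb = max_gap(lambda x: x * negbin(x, k, p) / (k * (1 - p) / p),
                 lambda x: negbin(x - 1, k + 1, p), range(0, 200))
print(gap_binom, gap_pois, gap_nb)  # all of the order of floating-point rounding
```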


An example of weighted distributions arises in sample surveys when unequal
probability sampling, or pps (probability proportional to size) sampling, is
employed. A general version of the sampling scheme involves two rv's $X$ and
$Y$ with pdf $p(x,y,\theta)$ and a weight function $w(y)$ which is a function of
$y$ only, giving the weighted pdf

$$p^w(x,y,\theta) = \frac{w(y)\,p(x,y,\theta)}{E[w(Y)]}. \tag{3.6}$$

In sample surveys we obtain observations on $(X^w, Y^w)$ from the pdf (3.6) and
draw inference on the unknown parameter $\theta$.

It is of interest to note that the marginal pdf of $X^w$ is

$$p^w(x,\theta) = \frac{w(x,\theta)\,p(x,\theta)}{E[w(X,\theta)]} \tag{3.7}$$

which is a weighted version of $p(x,\theta)$ with the weight function

$$w(x,\theta) = \int p(y \mid x,\theta)\,w(y)\,dy, \tag{3.8}$$

which may involve the unknown parameter $\theta$.
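The reduction (3.7)-(3.8) can be illustrated by simulation under assumptions
chosen only because they give a closed form: $X \sim N(0,1)$, $Y = X + \varepsilon$
with $\varepsilon \sim N(0,1)$ independent of $X$, and weight $w(y) = e^{y}$. Then
(3.8) gives $w(x,\theta) = E[e^{Y} \mid X = x] = e^{x+1/2}$, so by (3.7)
$X^w \sim N(1,1)$. Weighting simulated pairs by $w(y)$ and taking the weighted
mean of $x$ should therefore give a value close to 1, even though selection
acts on $y$ only.

```python
import numpy as np

rng = np.random.default_rng(2024)
n = 200_000
x = rng.standard_normal(n)
y = x + rng.standard_normal(n)  # Y depends on X; selection acts through Y only

w_y = np.exp(y)  # weight w(y) in (3.6)
mean_xw_via_y = np.sum(w_y * x) / np.sum(w_y)  # mean of X^w under (3.6)

w_x = np.exp(x + 0.5)  # induced weight w(x, theta) from (3.8)
mean_xw_via_x = np.sum(w_x * x) / np.sum(w_x)  # mean of X^w under (3.7)

print(round(mean_xw_via_y, 3), round(mean_xw_via_x, 3))  # both near the exact value 1
```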

There is an extensive literature on weighted distributions since the concept
was formalized in Rao (1965); it is reviewed, with a large number of
references, in a paper by Patil (1984) with special reference to ecological
work. Reference may also be made to two earlier contributions by Patil and
Rao (1977, 1978) and to Patil and Ord (1975), which contain reviews of
previous work and details of some new results.

In the next sections, we consider several examples where weighted
distributions are used in the analysis of data.


4. DIFFERENTIAL PRESERVATION OF SKULLS


The following problem arose in the analysis of cranial measurements. A
sample of skulls dug out from ancient graves in Jebel Moya, Africa, consisted
of some well-preserved skulls and the rest in a broken condition (see
Mukherjee, Trevor and Rao, 1955). On each well-preserved skull it was possible
to take four measurements, C (capacity), L (length), B (breadth), and H
(height), while on a broken skull only a subset of L, B, and H, and not C,
could be measured. The observed data thus consisted of samples from a
four-variate population with several observations missing: some sets had all
four measurements C, L, B, H, and others had only one, two or three of the
measurements L, B, and H. The problem was to estimate the mean values of C, L,
B, and H in the original population of skulls from the recovered fragmentary
samples. In a number of papers which appeared in the early issues of
Biometrika, it was the practice to estimate the unknown population mean of
any characteristic, say C, by taking the mean of all the available
measurements on C. An alternative, which is often recommended, is to compute
maximum likelihood estimates of the unknown means, variances, and covariances
by writing down the likelihood function based on all the available data,
assuming a four-variate normal distribution for C, L, B and H and using the
derived marginal distribution for an incomplete set of measurements. This is
based on the assumption that the skulls admitting all four measurements, or
any subset of the four, can be considered as random samples from the original
population of skulls. Is this assumption valid?

It is common knowledge that a certain proportion of the original skulls
gets broken, depending on the length of time and the depth at which they lay
buried. Let $w(c)$ be the probability that a skull of capacity $c$ is not broken
and $p(c,\theta)$ be the pdf of C in the original population. Then the pdf of C
measured on well-preserved skulls is

$$\frac{w(c)\,p(c,\theta)}{E[w(C)]}. \tag{4.1}$$

If $w(c)$ depends on $c$, then the observed measurements on C cannot be
considered as a random sample of C from the original population. Further, if
$w(c)$ is a decreasing function of $c$, then small skulls will be
over-represented among the unbroken skulls, and therefore the mean of the
available measurements on C will underestimate the mean capacity of the
original population.
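The direction of this bias can be seen in a case with a closed form, assumed
here purely for illustration: if C is normal with mean $\mu$ and variance
$\sigma^2$ and $w(c) = e^{-\beta c}$ with $\beta > 0$, then (4.1) is again normal
with mean $\mu - \beta\sigma^2$, so the mean of unbroken skulls is too small by
$\beta\sigma^2$. The sketch checks this by evaluating the mean of (4.1) on a fine
grid; the values of $\mu$, $\sigma$ and $\beta$ are hypothetical, not estimates from
the Jebel Moya data.

```python
import numpy as np

mu, sigma, beta = 1400.0, 100.0, 0.005  # hypothetical capacity (cc) parameters and decay rate

c = np.linspace(mu - 10 * sigma, mu + 10 * sigma, 200_001)  # grid over capacity
p = np.exp(-0.5 * ((c - mu) / sigma) ** 2)  # normal density up to a constant
w = np.exp(-beta * (c - mu))  # w(c) = exp(-beta c) times a constant, which cancels in (4.1)

# Mean of the weighted pdf (4.1); grid spacing and normalizing constants cancel.
mean_unbroken = np.sum(c * w * p) / np.sum(w * p)
print(mean_unbroken, mu - beta * sigma**2)  # both about 1350
```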

Is there any evidence that $w(c)$ depends on $c$? To answer this question, the
regression of C on L, B, and H (in terms of logarithms) was estimated from the
data sets where all four measurements were available, and used to predict the
mean capacity of broken skulls by substituting the observed averages $\bar L$,
$\bar B$ and $\bar H$ of broken skulls in the regression equation. In at least
two series of cranial measurements (see Rao and Shaw, 1948; Rao, 1973, p. 280)
it was found that the average measured capacity of unbroken skulls was
smaller than the estimated average capacity of broken skulls. This provided
some evidence of differential preservation of skulls, with smaller skulls
having a higher chance of remaining unbroken.

This finding invalidates the assumption that the skulls providing all four
measurements are a random sample from the original population of skulls. The
pdf associated with these measurements is more appropriately (4.1), which is a
weighted version of the original pdf with an unknown weight function.
Presumably, the pdf associated with observations on any subset of L, B, and H
will again be a weighted pdf, with a weight function depending on the degree of
damage to a skull. The expression for the correct likelihood will then depend
on the original pdf and on the probabilities of different degrees of damage,
as assessed by the subsets of measurements that can be taken on a skull, which
are likely to be unknown. Is there a reasonable solution to the problem of
estimating mean values in a situation like the above?

There are several possibilities, of which the following procedure for
estimating the mean of C appears to be a natural one. We use the complete
sets of measurements, C, L, B and H, on unbroken skulls to compute the
regressions of C on different subsets of L, B and H. Using the appropriate
regression function, we estimate (predict) the missing value of C for each
broken skull. Then an average is taken of all the measured and estimated
values of C. Such an average is likely to be a valid estimate of the mean of
C. The estimation is based on the assumption that the complete sets of
measurements (C, L, B, H) can provide valid estimates of relationships like
the regression functions of C on L, B, H and its subsets, although they are
biased samples from the original population. Similar methods can be used to
estimate the mean values of L, B and H.
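The procedure can be sketched in a simplified setting. The simulation below
assumes a single length measurement L available on every skull, a linear
regression of C on L, and a probability of remaining unbroken that decreases
with L but does not otherwise depend on C. This last assumption is what makes
the regression estimated from unbroken skulls valid for broken ones; if
breakage depended on C beyond what L explains, the fitted regression would
itself be biased. All parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 50_000
L = rng.normal(185.0, 7.0, n)  # hypothetical skull length (mm)
C = 1400.0 + 12.0 * (L - 185.0) + rng.normal(0.0, 60.0, n)  # capacity (cc)

# Larger skulls are assumed more likely to break: P(unbroken) falls with L.
p_unbroken = 1.0 / (1.0 + np.exp(0.3 * (L - 185.0)))
unbroken = rng.random(n) < p_unbroken

# Naive estimate: mean of C over unbroken skulls only (biased downward here).
naive = C[unbroken].mean()

# Regression of C on L fitted on complete (unbroken) skulls ...
slope, intercept = np.polyfit(L[unbroken], C[unbroken], 1)
# ... used to predict C for broken skulls; then all values are averaged.
C_filled = np.where(unbroken, C, intercept + slope * L)
imputed = C_filled.mean()

print(f"true {C.mean():.1f}  naive {naive:.1f}  imputed {imputed:.1f}")
```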

Paleontologists compare the characteristics of fossils of long bones and
cranial material discovered in different parts of the world to trace the
evolutionary history of hominids. Such studies based on physical measurements
may be misleading, as the discovered fossils may not be representative samples
from the original populations owing to differential preservation of skeletal
material. It is gratifying to note that attempts are being made to compare
fossils in terms of some basic chemical measurements which are not likely to
be subject to differential preservation.
5. ENQUIRY THROUGH AN OFFSPRING


In genetic and socio-psychological studies it is common practice to locate
an abnormal individual and, through him or her, collect information on the
status of brothers and sisters, parents, uncles, and aunts. From such data,
estimates are made of the incidence of abnormality in families by sex and
parity of birth. A family is the basic unit whose characteristics may have a
specified distribution. But our method of ascertainment gives unequal
probabilities to families, depending on the mechanism inherent in the
selection of an abnormal family member. Thus, the distribution applicable to
observed data on families is a weighted version of the distribution specified
for the families. We consider some examples, discuss the nature of the
problems involved in each case, and suggest possible solutions.


5.1 TOO MANY MALES?


During the last few years, while lecturing to students and teachers in
different parts of the world, I collected data on the numbers of brothers and
sisters in the family of each individual in the audience. The results are
summarized in Tables 2, 3, and 4. The data from the male respondents given in
Tables 2 and 4 show that the ratio of B, the total number of brothers
including the respondents, to B + S, the total number of brothers and sisters,
is much larger than half in each case, indicating a preponderance of male
children in the families of male members of the audience.




Rao (1977) showed that the appropriate model for the distribution of
brothers and sisters of male respondents is the size biased binomial, so that
the probability of $r$ brothers and $(n-r)$ sisters in a family of size $n$ is

$$\frac{r\binom{n}{r}\pi^r(1-\pi)^{n-r}}{E(r)} = \binom{n-1}{r-1}\pi^{r-1}(1-\pi)^{n-r}, \quad r = 1,\dots,n, \tag{5.1.1}$$

where $\pi$ is the probability of a male child. Since (5.1.1) states that $r - 1$
is binomial $B(n-1,\pi)$, the $B - k$ brothers other than the respondents are
binomial on the $B + S - k$ siblings other than the respondents, and we find
that

$$E\left(\frac{B-k}{B+S-k}\right) = \pi, \tag{5.1.2}$$

where $k$ is the number of male respondents, so that $(B-k)/(B+S-k)$ is an
estimate of $\pi$, and

$$\frac{[B-k-(B+S-k)\pi]^2}{(B+S-k)\,\pi(1-\pi)} \tag{5.1.3}$$

has an asymptotic chi-square distribution on 1 degree of freedom. Similar
results hold for the data from female respondents in Table 3. It is seen from
the chi-square values in Tables 2 and 3 that the data collected from the
students are consistent with the hypothesis of a size biased binomial with
$\pi = 1/2$.
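The entries of Tables 2-4 follow directly from (5.1.2) and (5.1.3) with
$\pi = 1/2$. The short function below (its name and interface are illustrative)
recomputes them, up to rounding in the last digit, for three rows of Table 2;
for female respondents in Table 3 the roles of $B$ and $S$ are interchanged.

```python
def sibship_stats(k: int, B: int, S: int, pi: float = 0.5):
    """Return B/(B+S), the estimate (5.1.2) of pi, and the chi-square (5.1.3).

    k: number of respondents; B: brothers including respondents; S: sisters.
    """
    m = B + S - k  # siblings other than the respondents
    crude = B / (B + S)
    pi_hat = (B - k) / m
    chi2 = (B - k - m * pi) ** 2 / (m * pi * (1 - pi))
    return crude, pi_hat, chi2


rows = {  # (k, B, S) taken from Table 2
    "Bangalore (India, 75)": (55, 180, 127),
    "Columbus (USA, 75)": (29, 65, 52),
    "Total": (1206, 3734, 2501),
}
for place, (k, B, S) in rows.items():
    crude, pi_hat, chi2 = sibship_stats(k, B, S)
    print(f"{place:24s} {crude:.3f} {pi_hat:.3f} {chi2:.2f}")
```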

The situation is somewhat different in Table 4, relating to data from
professors. The estimated $\pi$ exceeds one half in every case except Tirupati,
and most of the chi-square values are considerably higher than those for the
students. This implies that the weight function appropriate for these data is
of a higher order than $r$, the number of brothers. A possible sociological
explanation for this is that a person coming from a family with a larger
number of brothers tends to acquire better academic qualifications to compete
for jobs!


The following example on the observed sex ratio is illuminating. In a
survey of fertility and mortality, Dandekar and Dandekar (1953) gave the
distribution of brothers (excluding the informant) and sisters, and of sons and
daughters, as reported by 1115 'male heads' contacted through households
chosen with equal probability. In a survey of this type, a family with $r$
brothers gets a chance of selection nearly proportional to $r$, and the
conditions for a weighted binomial with $w(r) = r$ hold for the number of
brothers in a family. Yet we find from Table 5 that the total number of
brothers, 1325 (excluding the informants), is far in excess of the number of
sisters, 1014, giving

$$\chi^2 = \frac{(1325-1014)^2}{1325+1014} = 41.35,$$

which is very high on 1 degree of freedom. Is the theory of the size biased
binomial wrong?

It is clear from Table 5, however, that the disproportionate sex ratio is
confined to the age groups above 15-19 years, and the same phenomenon seems to
occur in the case of sons and daughters. There is perhaps an underreporting of
sisters and daughters who have been married off, owing to a superstitious
custom of not including them as members of the household. Underreporting of
female members is a persistent feature of data on fertility and mortality
collected in developing countries.
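The chi-square value for the Dandekar data is (5.1.3) with $\pi = 1/2$ applied
to the 1325 brothers and 1014 sisters other than the informants. The snippet
below reproduces it and, as an added illustration, converts it to a tail
probability using the identity $P(\chi^2_1 > x) = \operatorname{erfc}(\sqrt{x/2})$
for one degree of freedom.

```python
import math

brothers, sisters = 1325, 1014  # excluding the 1115 informants
chi2 = (brothers - sisters) ** 2 / (brothers + sisters)
p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
print(f"chi-square = {chi2:.2f}, p = {p_value:.1e}")
```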
Table 2. Data on male respondents (students)

(k = number of students, B = total number of brothers including the
respondent, S = total number of sisters.)

| Place and year | k | B | S | B/(B+S) | (B-k)/(B+S-k) | χ² |
|---|---|---|---|---|---|---|
| Bangalore (India, 75) | 55 | 180 | 127 | .586 | .496 | .02 |
| Delhi (India, 75) | 29 | 92 | 66 | .582 | .490 | .07 |
| Calcutta (India, 63) | 104 | 414 | 312 | .570 | .498 | .04 |
| Waltair (India, 69) | 39 | 123 | 88 | .583 | .491 | .09 |
| Ahmedabad (India, 75) | 29 | 84 | 49 | .632 | .523 | .35 |
| Tirupati (India, 75) | 592 | 1902 | 1274 | .599 | .484 | .50 |
| Poona (India, 75) | 47 | 125 | 65 | .658 | .545 | 1.18 |
| Hyderabad (India, 74) | 25 | 72 | 53 | .576 | .470 | .36 |
| Tehran (Iran, 75) | 21 | 65 | 40 | .619 | .500 | .19 |
| Isphahan (Iran, 75) | 11 | 45 | 32 | .584 | .515 | .06 |
| Tokyo (Japan, 75) | 50 | 90 | 34 | .725 | .540 | .49 |
| Lima (Peru, 82) | 38 | 132 | 87 | .603 | .519 | .27 |
| Shanghai (China, 82) | 74 | 193 | 132 | .594 | .474 | .67 |
| Columbus (USA, 75) | 29 | 65 | 52 | .556 | .409 | 2.91 |
| College St. (USA, 76) | 63 | 152 | 90 | .628 | .497 | .01 |
| Total | 1206 | 3734 | 2501 | .600 | .503 | 0.14 |
Table 3. Data on female respondents (students)

| Place and year | k | B | S | S/(B+S) | (S-k)/(B+S-k) | χ² |
|---|---|---|---|---|---|---|
| Lima (Peru, 82) | 16 | 37 | 48 | .565 | .464 | .36 |
| Los Banos (Philippines, 83) | 44 | 101 | 139 | .579 | .485 | .18 |
| Manila (Philippines, 83) | 84 | 197 | 281 | .588 | .500 | .00 |
| Bilbao (Spain, 83) | 14 | 19 | 35 | .576 | .525 | .10 |
| Total | 158 | 354 | 503 | .587 | .493 | .11 |

Table 4. Data on male respondents (professors)

| Place and year | k | B | S | B/(B+S) | (B-k)/(B+S-k) | χ² |
|---|---|---|---|---|---|---|
| State College (USA, 75) | 28 | 80 | 37 | .690 | .584 | 2.53 |
| Warsaw (Poland, 75) | 18 | 41 | 21 | .660 | .525 | 2.52 |
| Poznan (Poland, 75) | 24 | 50 | 17 | .746 | .567 | 1.88 |
| Pittsburgh (USA, 81) | 69 | 169 | 77 | .687 | .565 | 2.99 |
| Tirupati (India, 76) | 50 | 172 | 132 | .566 | .480 | .39 |
| Maracaibo (Venezuela, 82) | 24 | 95 | 56 | .629 | .559 | 1.77 |
| Richmond (USA, 81) | 26 | 57 | 29 | .663 | .517 | .03 |
| Total | 239 | 664 | 369 | .642 | .535 | 3.95 |

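children has $r_1$ males of whom $t_1$ are albinos and $r_2$ females of whom $t_2$
are albinos is

$$p(r_1,t_1;r_2,t_2) = \binom{n}{r_1}\left(\frac{1}{2}\right)^{n}\binom{r_1}{t_1}\pi_1^{t_1}(1-\pi_1)^{r_1-t_1}\binom{r_2}{t_2}\pi_2^{t_2}(1-\pi_2)^{r_2-t_2} \tag{5.2.1}$$

where $r_1 + r_2 = n$, $\pi_1$ and $\pi_2$ are the probabilities that a male and a
female child, respectively, is an albino, and the probability of a child being
a male or a female is taken as half.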


There are a number of ways in which we can introduce probabilities of
selection of affected families. We consider some models which are extensions
of those suggested by Fisher (1934) and Haldane (1938).

Introducing $\alpha$ and $\beta = 1 - \alpha$ as the relative probabilities of
observing a male or a female albino, we may consider a mixture of two size
biased distributions,

$$p^w(r_1,t_1;r_2,t_2) = \left(\frac{2\alpha t_1}{n\pi_1} + \frac{2\beta t_2}{n\pi_2}\right)p(r_1,t_1;r_2,t_2), \tag{5.2.2}$$

as the appropriate distribution of the observed vector $(r_1,t_1,r_2,t_2)$; here
$n\pi_1/2$ and $n\pi_2/2$ are the expected numbers of male and female albinos
under (5.2.1). If we have data on $(r_1,t_1,r_2,t_2)$ from $N$ ascertained
families, we can write down the likelihood using the expression (5.2.2) and
estimate the parameters $\alpha$, $\pi_1$ and $\pi_2$. Alternatively, we can use the
method of moments, using the statistics $\sum t_1$, $\sum t_2$ and $\sum r_1$ to
estimate the unknown parameters.


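If $\pi_1 = \pi_2 = \pi$, the expression (5.2.2) reduces to

$$\frac{2}{n\pi}(\alpha t_1 + \beta t_2)\,p(r_1,t_1;r_2,t_2) \tag{5.2.3}$$

and the estimates of $\alpha$ and $\pi$ can be obtained from the equations

$$\bar t_1 = \alpha + \frac{\pi}{2k}\sum_{i=1}^{k}(n_i - 1), \qquad \bar t_2 = \beta + \frac{\pi}{2k}\sum_{i=1}^{k}(n_i - 1), \tag{5.2.4}$$

where $k$ is the number of families, $n_i$ is the number of children in the
$i$-th family, and $\bar t_1$ and $\bar t_2$ are the average numbers of male and
female albino children in a family.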
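Adding the two equations in (5.2.4) eliminates $\alpha$ (since $\alpha + \beta = 1$)
and gives $\hat\pi = k(\bar t_1 + \bar t_2 - 1)/\sum_i(n_i - 1)$, after which
$\hat\alpha = \bar t_1 - \hat\pi\sum_i(n_i - 1)/(2k)$. The sketch below implements
this and checks the moment equations by drawing families from (5.2.1) with
$\pi_1 = \pi_2$, reweighted as in (5.2.3). The family sizes, the seed and the
true values $\alpha = 0.6$, $\pi = 0.25$ are assumptions for illustration only.

```python
import math
import random


def weighted_pmf(n, pi, alpha):
    """Joint pmf of (t1, t2) under (5.2.3) for a family of n children.

    Under (5.2.1) with pi1 = pi2 = pi, (t1, t2, n - t1 - t2) is multinomial
    with cell probabilities (pi/2, pi/2, 1 - pi). (5.2.3) reweights each
    outcome by alpha*t1 + beta*t2; dividing by the total plays the role of
    the constant 2/(n*pi).
    """
    beta = 1.0 - alpha
    pmf = {}
    for t1 in range(n + 1):
        for t2 in range(n - t1 + 1):
            coef = math.factorial(n) // (
                math.factorial(t1) * math.factorial(t2) * math.factorial(n - t1 - t2)
            )
            prob = coef * (pi / 2) ** (t1 + t2) * (1 - pi) ** (n - t1 - t2)
            pmf[(t1, t2)] = (alpha * t1 + beta * t2) * prob
    total = sum(pmf.values())
    return {key: v / total for key, v in pmf.items()}


def moment_estimates(t1, t2, sizes):
    """Solve the two equations (5.2.4) for (alpha, pi)."""
    k = len(sizes)
    t1_bar, t2_bar = sum(t1) / k, sum(t2) / k
    s = sum(n - 1 for n in sizes)
    pi_hat = k * (t1_bar + t2_bar - 1.0) / s
    alpha_hat = t1_bar - pi_hat * s / (2 * k)
    return alpha_hat, pi_hat


alpha_true, pi_true = 0.6, 0.25  # assumed true values

# Exact check of the first equation of (5.2.4) for a family of 6 children.
pmf6 = weighted_pmf(6, pi_true, alpha_true)
mean_t1 = sum(t1 * pr for (t1, _), pr in pmf6.items())
print(mean_t1, alpha_true + (6 - 1) * pi_true / 2)

# Simulated ascertained families with hypothetical sizes 2..8.
rng = random.Random(11)
sizes = [rng.randint(2, 8) for _ in range(20_000)]
tables = {}
for n in set(sizes):
    pmf = weighted_pmf(n, pi_true, alpha_true)
    tables[n] = (list(pmf.keys()), list(pmf.values()))
t1s, t2s = [], []
for n in sizes:
    keys, probs = tables[n]
    t1, t2 = rng.choices(keys, weights=probs)[0]
    t1s.append(t1)
    t2s.append(t2)
print(moment_estimates(t1s, t2s, sizes))
```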

Another model is as follows. Let $p_1$ and $p_2$ be the probabilities of
observing a male and a female albino, respectively. Then the probability that a
family with $n$ children, having $t_1$ male albinos and $r_1 - t_1$ normal males,
and $t_2$ female albinos and $r_2 - t_2$ normal females, is investigated $s_1$
times by observing a male albino and $s_2$ times by observing a female albino is

$$\binom{t_1}{s_1}p_1^{s_1}(1-p_1)^{t_1-s_1}\binom{t_2}{s_2}p_2^{s_2}(1-p_2)^{t_2-s_2}\,p(r_1,t_1;r_2,t_2). \tag{5.2.5}$$


Since  a  family  is  not  investigated  unless  at  least  one  of  t^  and  t2  is 
different  from  zero,  the  effective  distribution  for  the  observed  data  is 
(5.2.5)  normalized  by  the  dividing  factor  [1  -  (1  -  p  )n]  where 

p  =  ♦  p  2tt2)/2.  The  method  of  estimation  of  pj,  p2,  and  ir2  when  we 

have  the  additional  information  on  the  number  of  times  each  family  is 
Investigated  is  discussed  in  detail  in  Rao  (1965). 
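The normalizing factor 1 − (1 − p)^n can be checked by brute force: summing (5.2.5) over all (s1, s2) other than (0, 0) leaves [1 − (1 − p1)^t1 (1 − p2)^t2] p(r1, t1; r2, t2), and summing that over all family compositions of size n should reproduce the closed form. The sketch below does this enumeration in Python; the helper names and the parameter values are arbitrary choices for illustration.

```python
from math import comb


def p_family(r1, t1, r2, t2, pi1, pi2):
    """Model (5.2.1): each child is male or female with probability 1/2, and is
    albino with probability pi1 (male) or pi2 (female), independently."""
    n = r1 + r2
    return (comb(n, r1) * 0.5**n
            * comb(r1, t1) * pi1**t1 * (1 - pi1) ** (r1 - t1)
            * comb(r2, t2) * pi2**t2 * (1 - pi2) ** (r2 - t2))


def prob_investigated(n, pi1, pi2, p1, p2):
    """Total mass of (5.2.5) with s1 + s2 >= 1, by enumerating family compositions."""
    total = 0.0
    for r1 in range(n + 1):
        r2 = n - r1
        for t1 in range(r1 + 1):
            for t2 in range(r2 + 1):
                # Probability that no albino child leads to an investigation (s1 = s2 = 0).
                none_observed = (1 - p1) ** t1 * (1 - p2) ** t2
                total += p_family(r1, t1, r2, t2, pi1, pi2) * (1 - none_observed)
    return total


def normalizer(n, pi1, pi2, p1, p2):
    """Closed form 1 - (1 - p)^n with p = (p1*pi1 + p2*pi2) / 2."""
    p = (p1 * pi1 + p2 * pi2) / 2
    return 1 - (1 - p) ** n


for n in range(1, 6):
    print(n, prob_investigated(n, 0.3, 0.2, 0.6, 0.4), normalizer(n, 0.3, 0.2, 0.6, 0.4))
```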


In case a family is investigated only once although more than one abnormal child in the family is observed, the appropriate distribution is

$$\frac{\left[1-(1-p_1)^{t_1}(1-p_2)^{t_2}\right]p(r_1,t_1;r_2,t_2)}{1-(1-\theta)^{n}} \tag{5.2.6}$$

where θ = (π1p1 + π2p2)/2. If p1 = p2 = p and π1 = π2 = π, then the expression (5.2.6) reduces to

$$\frac{1-(1-p)^{t_1+t_2}}{1-(1-\pi p)^{n}}\cdot\frac{n!}{t_1!\,(r_1-t_1)!\,t_2!\,(r_2-t_2)!}\left(\frac{\pi}{2}\right)^{t_1+t_2}\left(\frac{1-\pi}{2}\right)^{n-t_1-t_2} \tag{5.2.7}$$

If sex is ignored, then (5.2.7) becomes

$$\frac{1-(1-p)^{t}}{1-(1-\pi p)^{n}}\cdot\frac{n!}{t!\,(n-t)!}\,\pi^{t}(1-\pi)^{n-t} \tag{5.2.8}$$

where t = t1 + t2, which is the expression used by Haldane (1938).
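Since E[(1 − p)^t] = (1 − πp)^n when t is binomial(n, π), the denominator of (5.2.8) is exactly E[1 − (1 − p)^t], so (5.2.8) is a weighted binomial with weight 1 − (1 − p)^t. The sketch below (the name `haldane_pmf` is hypothetical) evaluates it; the checks illustrate the two limiting cases, p = 1 giving the zero-truncated binomial and p → 0 giving the size-biased binomial with weight t.

```python
from math import comb


def haldane_pmf(t, n, pi, p):
    """Distribution (5.2.8): number t of affected children in an ascertained family of size n."""
    if not 1 <= t <= n:
        return 0.0
    weight = 1 - (1 - p) ** t          # chance that at least one affected child is observed
    norm = 1 - (1 - pi * p) ** n       # E[weight] when t ~ binomial(n, pi)
    return weight * comb(n, t) * pi**t * (1 - pi) ** (n - t) / norm


print([round(haldane_pmf(t, 4, 0.25, 0.5), 4) for t in range(1, 5)])
```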


We have considered three different models, (5.2.2), (5.2.5) and (5.2.6), for the probability of selection of a family. In the case where we have information only on the number r of abnormal children in a family of size n, without any sex distinction, we may consider a weighted binomial distribution

$$\frac{w(r)}{E[w(r)]}\binom{n}{r}\pi^{r}\phi^{n-r}, \qquad \phi = 1-\pi \tag{5.2.9}$$

with three possible alternatives for w(r):

$$w(r) = r \tag{5.2.10}$$

$$w(r) = r^{a} \quad (a \text{ unknown}) \tag{5.2.11}$$

$$w(r) = 1-(1-p)^{r} \quad (p \text{ unknown}) \tag{5.2.12}$$

The maximum likelihood method of estimating a and π under the model (5.2.9, 5.2.11) is discussed in Rao (1965), and of p and π under the model (5.2.9, 5.2.12) in Haldane (1938). To demonstrate the relevance of the weight function (5.2.11), we compare in Table 6 the observed frequencies of albino children in families of different sizes with the expected values under the two weight functions w(r) = r and w(r) = r^{1/2}, choosing π = 1/4. It is seen that the weight function w(r) = r^{1/2} provides a better fit.


Table 6. Observed and expected frequencies of albino children for each family size (n)
(expected (1) for w(r) = r and expected (2) for w(r) = r^{1/2})

no. of           n = 2                n = 3                n = 4                n = 5
albinos    obs    (1)    (2)    obs    (1)    (2)    obs    (1)    (2)    obs    (1)    (2)
1           31  30.00  32.37     37  30.93  35.81     22  21.10  26.07     25  19.00  24.93
2            9  10.00   7.63     15  20.63  16.88     21  21.09  18.43     23  25.31  23.50
3                                 3   3.44   2.30      7   7.03   5.02     10  12.65   9.59
4                                                      0   0.78   0.46      1   2.81   1.85
5                                                                           1   0.23   0.13
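The expected columns of Table 6 can be recomputed from (5.2.9) with π = 1/4 by scaling each family size's weighted binomial probabilities to the observed total for that size. In the sketch below, `OBSERVED` transcribes the observed columns of the table and the helper name is hypothetical; its output agrees with the printed expected values to within about 0.03.

```python
from math import comb, sqrt

# Observed frequencies from Table 6: family size n -> counts of r = 1, 2, ..., n albinos.
OBSERVED = {
    2: [31, 9],
    3: [37, 15, 3],
    4: [22, 21, 7, 0],
    5: [25, 23, 10, 1, 1],
}


def expected_frequencies(observed, n, pi, weight):
    """Expected counts for r = 1..n under the weighted binomial (5.2.9).

    Both weights used in Table 6 vanish at r = 0, so normalizing over r >= 1
    is the same as dividing by E[w(r)].
    """
    terms = [weight(r) * comb(n, r) * pi**r * (1 - pi) ** (n - r)
             for r in range(1, n + 1)]
    scale = sum(observed) / sum(terms)
    return [scale * t for t in terms]


for n, obs in OBSERVED.items():
    e1 = expected_frequencies(obs, n, 0.25, lambda r: r)   # w(r) = r
    e2 = expected_frequencies(obs, n, 0.25, sqrt)          # w(r) = r ** 0.5
    print(n, [round(x, 2) for x in e1], [round(x, 2) for x in e2])
```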
For a general discussion of the type of problems discussed in this section, and a few other models for selection probabilities, the reader is referred to Stene (1981) and other references mentioned in that paper. For estimation of a and π in the model (5.2.9, 5.2.11), reference may be made to Rao (1965).

5.3 ALCOHOLISM, FAMILY SIZE AND BIRTH ORDER

Smart (1963, 1969) and Sprott (1969) examined a number of hypotheses on the incidence of alcoholism in Canadian families using the data on family size and birth order of 242 alcoholics admitted to three alcoholism clinics in Ontario. The method of sampling is thus of the type discussed in Sections 5.1 and 5.2.

One of the hypotheses tested was that "larger families contain a larger number of alcoholics than expected." The null hypothesis was interpreted to imply that the observations on family size as ascertained arise from the weighted distribution

$$\frac{n\,p(n)}{E(n)}, \qquad n = 1, 2, \ldots \tag{5.3.1}$$

where p(n), n = 1, 2, ..., is the distribution of family size in the general population, including families with no alcoholics. It may be noted that the distribution (5.3.1) would be applicable if we had observed an individual (alcoholic or not) at random from the general population and ascertained the size of the family to which he or she belonged. It needs some argument to show that the same distribution holds for family size ascertained by observing the alcoholic individuals only. The following justification of (5.3.1) makes use of an interpretation of the null hypothesis that is being tested.


Let π be the probability of an individual becoming an alcoholic, let φ = 1 − π, and suppose that the probability that a member of a family becomes an alcoholic is independent of whether another member is alcoholic or not. Further, let p(n), n = 1, 2, ..., be the probability distribution of family size (whether a family has an alcoholic or not) in the general population. Then the probability that a family is of size n and has r alcoholics is

$$p(n)\binom{n}{r}\pi^{r}\phi^{n-r}, \qquad r = 0,\ldots,n;\ n = 1, 2, \ldots \tag{5.3.2}$$

From (5.3.2), it follows that the distribution of family size in the general population, given that a family has at least one alcoholic, is

$$\frac{(1-\phi^{n})\,p(n)}{1-E(\phi^{n})}, \qquad n = 1, 2, \ldots \tag{5.3.3}$$

If we had chosen households at random and recorded the family sizes in households containing at least one alcoholic, then the null hypothesis on the excess of alcoholics in larger families could be tested by comparing the observed frequencies with the expected frequencies under the model (5.3.3).
However, under the sampling scheme adopted, the weighted distribution of (n, r) is more appropriate. If we had information both on the family size (n) and on the number of alcoholics (r) in the family, we could have compared the observed joint frequencies of (n, r) with those expected under the model

$$\frac{r\,p(n)}{\pi E(n)}\binom{n}{r}\pi^{r}\phi^{n-r}, \qquad r = 1,\ldots,n;\ n = 1, 2, \ldots \tag{5.3.4}$$

which is (5.3.2) weighted by r, the number of alcoholics through whom the family can be ascertained. From (5.3.4), the marginal distribution of n alone is

$$\frac{n\,p(n)}{E(n)}, \qquad n = 1, 2, \ldots \tag{5.3.5}$$

which is used by Smart and Sprott as a model for the observed frequencies of family sizes. It is shown in (5.3.3) that in the general population, the distribution of family size in families with at least one alcoholic is

$$\frac{(1-\phi^{n})\,p(n)}{1-E(\phi^{n})}$$

which reduces to (5.3.5) if φ is close to unity, since then 1 − φ^n ≈ nπ. In other words, if the probability of an individual becoming an alcoholic is small, then the distribution of family size as ascertained is close to the distribution of family size in families with at least one alcoholic in the general population. This is not true if φ is not close to unity.

Smart and Sprott found that the distribution (5.3.5) did not fit the observed frequencies, which had heavier tails, supporting the hypothesis under test.

An alternative to (5.3.4) is obtained by assuming that each alcoholic has a chance θ of being admitted to a clinic independently of other alcoholic family members. In such a case, the probability that a family of size n has r alcoholics and a member has been admitted to a clinic is

$$p(n)\binom{n}{r}\pi^{r}\phi^{n-r}\left[1-(1-\theta)^{r}\right] \tag{5.3.6}$$


The marginal distribution of n with the normalizing factor is then

$$\frac{p(n)\left[1-(1-\pi\theta)^{n}\right]}{E\left[1-(1-\pi\theta)^{n}\right]}, \qquad n = 1, 2, \ldots \tag{5.3.7}$$

The distribution (5.3.7) involves one unknown parameter, πθ, which needs to be estimated in fitting to the observed frequencies of family sizes. Some examples of distributions of the type (5.3.7) have been considered by Barrai, Mi, Morton and Yasuda (1965). The distribution (5.3.7) is close to (5.3.5) if πθ is small.
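To see how quickly (5.3.7) approaches (5.3.5) as πθ shrinks, one can compute the total variation distance between them for a given p(n). The family-size distribution below is a made-up uniform distribution on 1 to 8, used only for illustration, and the function names are hypothetical. Setting the parameter `a` to π instead of πθ gives (5.3.3), so the same function also illustrates the remark about φ close to unity.

```python
def ascertained_family_size(p, a):
    """Distribution (5.3.7) of ascertained family size.

    p: dict mapping family size n (n >= 1) to its population probability p(n).
    a: the single parameter pi*theta of (5.3.7); a = pi gives (5.3.3).
    """
    w = {n: pn * (1 - (1 - a) ** n) for n, pn in p.items()}
    total = sum(w.values())                      # E[1 - (1 - a)^n]
    return {n: v / total for n, v in w.items()}


def size_biased_family_size(p):
    """Distribution (5.3.5): n p(n) / E(n)."""
    mean = sum(n * pn for n, pn in p.items())
    return {n: n * pn / mean for n, pn in p.items()}


def total_variation(d1, d2):
    return 0.5 * sum(abs(d1[n] - d2[n]) for n in d1)


# Hypothetical population family-size distribution, for illustration only.
p = {n: 1 / 8 for n in range(1, 9)}
target = size_biased_family_size(p)
for a in (0.5, 0.1, 0.01, 0.001):
    print(a, total_variation(ascertained_family_size(p, a), target))
```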


We may also consider a more complicated model by assuming different probabilities π1 and π2 for males and females becoming alcoholic, and also different probabilities θ1 and θ2 for male and female alcoholics being referred to a clinic. In such a case, the probability of inclusion of a family of size n with r1 males and s1 male alcoholics, r2 females and s2 female alcoholics is

$$p(n)\binom{n}{r_1}\left(\frac{1}{2}\right)^{n}\binom{r_1}{s_1}\pi_1^{s_1}\phi_1^{r_1-s_1}\binom{r_2}{s_2}\pi_2^{s_2}\phi_2^{r_2-s_2}\left[1-(1-\theta_1)^{s_1}(1-\theta_2)^{s_2}\right] \tag{5.3.8}$$

where φ1 = 1 − π1 and φ2 = 1 − π2,


which gives the marginal distribution of n as

$$\frac{p(n)\left[1-2^{-n}(2-\pi_1\theta_1-\pi_2\theta_2)^{n}\right]}{E\left[1-2^{-n}(2-\pi_1\theta_1-\pi_2\theta_2)^{n}\right]} \tag{5.3.9}$$

which again involves one unknown parameter, δ = (π1θ1 + π2θ2)/2. The marginal distribution of r1 and r2 obtained from (5.3.8) is

$$p(n)\binom{n}{r_1}\left(\frac{1}{2}\right)^{n}\frac{1-(1-\pi_1\theta_1)^{r_1}(1-\pi_2\theta_2)^{r_2}}{E\left[1-2^{-n}(2-\pi_1\theta_1-\pi_2\theta_2)^{n}\right]} \tag{5.3.10}$$

where n = r1 + r2. If π1θ1 and π2θ2 are small, then (5.3.10) becomes

$$p(n)\binom{n}{r_1}\left(\frac{1}{2}\right)^{n}\frac{2(r_1\pi_1\theta_1 + r_2\pi_2\theta_2)}{(\pi_1\theta_1+\pi_2\theta_2)\,E(n)} \tag{5.3.11}$$


If we had the joint frequencies of males and females in the observed families of alcoholics, we could have fitted distributions of the type (5.3.10) and (5.3.11) to test the null hypothesis of a larger number of alcoholics in larger families.


It is seen from (5.3.10) and (5.3.11) that the distribution of (r1, r2) will not be symmetric unless π1θ1 = π2θ2. This may result in an excess of males or females in observed families. Such an effect (with an excess of males) can be seen in similar data studied by Freire-Maia and Chakraborty (1975) and Rao, Mazumdar, Waller and Li (1973); these authors have not, however, commented on it.
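The asymmetry is easy to see numerically from (5.3.10). In the sketch below, the hypothetical parameters `a1` and `a2` stand for π1θ1 and π2θ2, and p(n) is a made-up uniform distribution on family sizes 1 to 6. When a1 > a2, ascertained families contain more males than females on average; when a1 = a2, the excess vanishes.

```python
from math import comb


def sex_composition(p, a1, a2):
    """Joint distribution (5.3.10) of (r1, r2), the numbers of males and females in
    ascertained families. p maps family size n to p(n); a1 = pi1*theta1, a2 = pi2*theta2."""
    joint = {}
    for n, pn in p.items():
        for r1 in range(n + 1):
            r2 = n - r1
            # Chance that at least one alcoholic member is referred to a clinic.
            incl = 1 - (1 - a1) ** r1 * (1 - a2) ** r2
            joint[(r1, r2)] = pn * comb(n, r1) * 0.5**n * incl
    total = sum(joint.values())   # equals E[1 - 2^{-n}(2 - a1 - a2)^n]
    return {key: v / total for key, v in joint.items()}


def mean_males_minus_females(dist):
    return sum((r1 - r2) * q for (r1, r2), q in dist.items())


p = {n: 1 / 6 for n in range(1, 7)}   # hypothetical p(n)
print(mean_males_minus_females(sex_composition(p, 0.10, 0.02)))   # positive: excess of males
print(mean_males_minus_females(sex_composition(p, 0.05, 0.05)))   # zero up to rounding
```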


Another hypothesis considered by Smart was that later born children have a greater tendency to become alcoholic than earlier born children. The method used by Smart may be somewhat confusing to statisticians, and Sprott made some comments criticizing Smart's approach. We shall review Smart's analysis in the light of the model (5.3.4). If we assume that birth order has no relationship to becoming an alcoholic, and that the probability of an alcoholic being referred to a clinic is independent of the birth order, then, using the model (5.3.4), the probability that an observed alcoholic belongs to a family with n children and r alcoholics and has a given birth order s is

$$\frac{1}{n}\cdot\frac{r\,p(n)}{\pi E(n)}\binom{n}{r}\pi^{r}\phi^{n-r}, \qquad s = 1,\ldots,n;\ r = 1,\ldots,n;\ n = 1, 2, \ldots \tag{5.3.12}$$
Summing over r, the marginal distribution of (n, s), the family size and birth order, applicable to the observed distribution, is

$$\frac{p(n)}{E(n)}, \qquad s = 1,\ldots,n;\ n = 1, 2, \ldots \tag{5.3.13}$$

where it may be recalled that p(n), n = 1, 2, ..., is the distribution of family size in the general population. Smart gave the observed bivariate frequencies of (n, s), and since p(n) was known, the expected values could have been computed and compared with the observed. He did something else.

From (5.3.13), the marginal distribution of birth rank is

$$\sum_{i=s}^{\infty}\frac{p(i)}{E(n)}, \qquad s = 1, 2, \ldots \tag{5.3.14}$$

Smart's (1963) analysis in his Table 2 is an attempt to compare the observed distribution of birth ranks with the expected under the model (5.3.14), with p(i) itself estimated from data using the model (5.3.1).
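Because P(n ≥ s) can only decrease as s increases, (5.3.14) always favours early birth ranks, even when birth order has no effect at all. A small sketch with a made-up p(n) (the function name is hypothetical):

```python
def birth_rank_distribution(p):
    """Distribution (5.3.14) of the birth rank s of an ascertained individual
    under the null hypothesis: P(s) = sum over i >= s of p(i), divided by E(n)."""
    mean = sum(n * pn for n, pn in p.items())
    return {s: sum(pn for n, pn in p.items() if n >= s) / mean
            for s in range(1, max(p) + 1)}


p = {1: 0.2, 2: 0.3, 3: 0.25, 4: 0.15, 5: 0.1}   # hypothetical p(n)
dist = birth_rank_distribution(p)
print({s: round(q, 4) for s, q in dist.items()})
```

The probabilities sum to one because the sum over s of P(n ≥ s) equals E(n).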

A better method is as follows. From (5.3.13) it is seen that, for a given family size, the expected birth order frequencies are equal (as computed by Smart (1963) in Table 1), in which case individual chi-squares comparing the expected and observed frequencies for each family size would provide all the information about the hypothesis under test. Such a procedure would be independent of any knowledge of p(n). But it is not clear whether a hypothesis of the type posed by Smart can be tested on the basis of the available data without further information on the other alcoholics in the family, such as their ages, sexes, etc.
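Under (5.3.13) the birth ranks are equally likely within each family size, so each family size can be tested separately with a Pearson chi-square on n − 1 degrees of freedom. A minimal sketch; the counts are hypothetical (not Smart's), and the p-value would be read from chi-square tables or a statistics library rather than computed here.

```python
def uniform_chi_square(counts):
    """Pearson chi-square for H0: birth ranks 1..n are equally likely within family size n.

    counts: observed frequencies of birth ranks 1..n for one family size n.
    Returns (statistic, degrees of freedom).
    """
    n = len(counts)
    total = sum(counts)
    if n < 2 or total == 0:
        raise ValueError("need at least two birth ranks and a positive total")
    expected = total / n                   # equal expected frequency for every rank
    stat = sum((c - expected) ** 2 / expected for c in counts)
    return stat, n - 1


# Hypothetical counts of birth ranks 1, 2, 3 in families of size 3.
print(uniform_chi_square([20, 14, 8]))
```

Because different family sizes give independent samples, the per-size statistics and their degrees of freedom can also be added to form an overall test.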


Let us consider a portion of Table 1 in Smart (1963) relating to families up to size 4 and birth ranks up to 4.

Table 7. Distribution of birth rank (s), from Smart (1963, Table 1)
It is seen that for family sizes 2 and 3, the observed frequencies seem to
contradict  the  hypothesis,  and  for  family  sizes  above  3  (see  Smart's  Table  1), 
birth  rank  does  not  have  any  effect.  It  is  interesting  to  compare  the  above 
data  with  a  similar  type  of  data  collected  by  the  author  on  birth  rank  and 
family  size  of  the  staff  members  in  two  departments  at  the  University  of 
Pittsburgh. 

Table 8. Distribution of birth rank and family size (up to 4) among staff members

birth rank        family size: 1    2    3    4
It appears that there are too many earlier borns among the staff members, indicating that becoming a professor is an affliction of the earlier born! It is clear that the observed data by themselves do not enable any inference to be drawn on the relationship between the birth rank of a child and any attribute under consideration.


6. QUADRAT SAMPLING WITH VISIBILITY BIAS

For estimating wildlife population density, quadrat sampling has been found generally preferable. Quadrat sampling is carried out by first selecting at random a number of quadrats of fixed size from the region under study and ascertaining the number of animals in each. The following assumptions are made:


- A1: Animals are found in groups within each quadrat, and the number of animals X in a group follows a specified distribution.

- A2: The number of groups N within a quadrat has a specified distribution.

- A3: The number of groups within a quadrat and the numbers of animals within groups are independent.

Let the method of sampling be such that the probability of sighting (or recording) a group of x animals is w(x). If X^w and N^w denote the random variables for the number of animals in a group and the number of groups within a quadrat, as ascertained, then we have the following results.


(i)

$$P(N^{w}=m \mid N=n) = \binom{n}{m}\omega^{m}(1-\omega)^{n-m} \tag{6.1}$$

where

$$\omega = \sum_{x} w(x)\,P(X=x) \tag{6.2}$$

is the visibility factor (or the probability of recording a group).

(ii)

$$P(N^{w}=m) = \sum_{n \ge m}\binom{n}{m}\omega^{m}(1-\omega)^{n-m}P(N=n) \tag{6.3}$$

(iii)

$$P(N^{w}=m,\ X_1^{w}=x_1,\ldots,X_m^{w}=x_m) = \omega^{-m}P(N^{w}=m)\prod_{i=1}^{m}w(x_i)\,P(X=x_i) \tag{6.4}$$

(iv) Let S^w = X_1^w + ... + X_{N^w}^w. Then

$$P(S^{w}=y) = \sum_{m=1}^{\infty}P(N^{w}=m)\,P(S^{w}=y \mid m) \tag{6.5}$$

where

$$P(S^{w}=y \mid m) = \sum_{x_1+\cdots+x_m=y}\frac{w(x_1)}{\omega}P(X=x_1)\cdots\frac{w(x_m)}{\omega}P(X=x_m). \tag{6.6}$$

The formulae listed above are useful in many practical situations. Usually, the sighting probability is of the form

$$w(x) = 1-(1-\beta)^{x}. \tag{6.7}$$
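As a numerical check of (6.2) and (6.3), suppose (an assumption, not part of A1 to A3) that N is Poisson with mean λ. The binomial thinning in (6.1) then makes N^w Poisson with mean λω, a standard property of the Poisson distribution. The sketch below uses the sighting probability (6.7) with β = 0.4 and a hypothetical group-size distribution; it also computes the distribution w(x)P(X = x)/ω of a recorded group's size, which is the factor appearing in (6.4) and (6.6).

```python
from math import comb, exp, lgamma, log


def visibility_factor(x_pmf, w):
    """omega in (6.2): probability that a group is recorded."""
    return sum(w(x) * px for x, px in x_pmf.items())


def recorded_groups_pmf(m, n_pmf, omega, n_max):
    """P(N^w = m) from (6.3), truncating the sum over n at n_max."""
    return sum(comb(n, m) * omega**m * (1 - omega) ** (n - m) * n_pmf(n)
               for n in range(m, n_max + 1))


def poisson_pmf(lam):
    """Poisson probabilities on the log scale, to avoid overflow for large n."""
    return lambda n: exp(n * log(lam) - lam - lgamma(n + 1))


beta = 0.4
w = lambda x: 1 - (1 - beta) ** x        # sighting probability (6.7)
x_pmf = {1: 0.5, 2: 0.3, 3: 0.2}         # hypothetical group-size distribution
omega = visibility_factor(x_pmf, w)
visible_size_pmf = {x: w(x) * px / omega for x, px in x_pmf.items()}
lam = 3.0
for m in range(5):
    print(m, recorded_groups_pmf(m, poisson_pmf(lam), omega, n_max=150),
          poisson_pmf(lam * omega)(m))
```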


For some applications, the reader is referred to papers by Cook and Martin (1974), and Patil and Rao (1977, 1978).


7.  WAITING  TIME  PARADOX 


Patil (1984) reported a study conducted in 1966 by the "Institut National de la Statistique et de l'Economie Appliquée" in Morocco to estimate the mean sojourn time of tourists. Two types of surveys were conducted, one by contacting tourists residing in hotels and another by contacting tourists at frontier stations while they were leaving the country. The mean sojourn time as reported by 3,000 tourists in hotels was 17.8 days, and by 12,321 tourists at frontier stations was 9.0 days. Regarding it with suspicion, the officials in the department of planning discarded the estimate from the hotels.

It  is  clear  that  the  observations  collected  from  tourists  while  leaving  the 
country  correspond  to  the  true  distribution  of  sojourn  time,  so  that  the 
observed  average  9.0  is  a  valid  estimate  of  the  mean  sojourn  time.  It  can  be 
shown  that  in  a  steady  state  of  flow  of  tourists,  the  sojourn  time  as  reported 
by  those  contacted  at  hotels  has  a  size  biased  distribution  so  that  the 
observed average will be an overestimate of the mean sojourn time. If X^w is a size biased random variable, then

$$E\left[(X^{w})^{-1}\right] = \int \frac{1}{x}\cdot\frac{x f(x)}{\mu}\,dx = \mu^{-1} \tag{7.1}$$

where μ is the expected value of X, the original variable, and f(x) is its pdf. The formula (7.1) shows that the harmonic mean of the size biased observations is a valid estimate of μ. Thus the harmonic mean of the observations from the tourists in hotels would have provided an estimate comparable with the arithmetic mean of the observations from the tourists at the frontier stations.
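Formula (7.1) holds whatever the distribution of X, which a discrete example makes easy to verify exactly. The sketch below assumes, purely for illustration, that sojourn times are geometric on 1, 2, ... days with mean 9 (the sum is truncated far into the tail). The size-biased arithmetic mean is then E(X²)/μ = 17, while the harmonic mean of the size-biased distribution returns μ = 9.

```python
def size_biased_means(pmf):
    """Mean of X, and arithmetic and harmonic means of its size-biased version.

    pmf: dict x -> P(X = x) for positive values x.
    """
    mu = sum(x * px for x, px in pmf.items())
    biased = {x: x * px / mu for x, px in pmf.items()}        # x p(x) / mu
    arithmetic = sum(x * q for x, q in biased.items())        # E(X^w) = E(X^2) / mu
    harmonic = 1 / sum(q / x for x, q in biased.items())      # 1 / E[(X^w)^-1], see (7.1)
    return mu, arithmetic, harmonic


# Hypothetical sojourn-time model: geometric on 1, 2, ... days with mean 1/q = 9.
q = 1 / 9
pmf = {x: q * (1 - q) ** (x - 1) for x in range(1, 2001)}
mu, am, hm = size_biased_means(pmf)
print(round(mu, 6), round(am, 6), round(hm, 6))   # 9.0 17.0 9.0
```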

It is interesting to note that the estimate from hotel residents is nearly twice the other, a factor which occurs in the "waiting time paradox" (see Feller, 1966; Patil and Rao, 1977) associated with the exponential distribution. This suggests, but does not confirm, that the sojourn time distribution may be exponential.

Suppose that the tourists at hotels were asked how long they had been staying in the country up to the time of enquiry. In such a case, we may assume that the distribution of Y, the time a tourist has been in the country up to the time of enquiry, is the same as that of the product X^wR, where X^w is the size biased version of X, the sojourn time, and R is an independent rv with a uniform distribution on [0, 1]. If F(x) is the distribution function of X, then the pdf of Y is

$$\int_{y}^{\infty}\frac{1}{x}\cdot\frac{x\,dF(x)}{\mu} = \mu^{-1}\left[1-F(y)\right]. \tag{7.2}$$

The parameter μ can be estimated on the basis of observations on Y, provided the functional form of F(y), the distribution function of the sojourn time, is known.
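When F is exponential with mean μ, (7.2) is again exponential with mean μ, so the sample mean of the elapsed times Y estimates μ directly. A simulation sketch under that assumption: for an exponential X, the size-biased X^w is gamma with shape 2 and scale μ, and R is uniform. The function name, seed, and sample size are arbitrary illustrative choices.

```python
import random


def simulate_elapsed_times(mu, size, seed=0):
    """Draw Y = X^w * R when the sojourn time X is exponential with mean mu.

    X^w (size-biased exponential) is gamma with shape 2 and scale mu;
    R is uniform on [0, 1) and independent of X^w.
    """
    rng = random.Random(seed)
    return [rng.gammavariate(2.0, mu) * rng.random() for _ in range(size)]


mu = 9.0
ys = simulate_elapsed_times(mu, 100_000)
# Under (7.2) with exponential F, Y is exponential with mean mu.
print(sum(ys) / len(ys))
```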

It  is  interesting  to  note  that  the  pdf  (7.2)  is  the  same  as  that  obtained  by 
Cox  (1962)  in  studying  the  distribution  of  failure-time  of  a  component  used  in 
different  machines  from  observations  on  the  ages  of  the  components  in  use  at 
the  time  of  investigation. 

8.  DAMAGE  MODELS 

Let N be an rv with probability distribution p_n, n = 0, 1, 2, ..., and R be an rv such that

$$P(R=r \mid N=n) = s(r,n). \tag{8.1}$$

Then the marginal distribution of R truncated at zero is

$$p_r^{*} = (1-p)^{-1}\sum_{n=r}^{\infty}p_n\,s(r,n), \qquad r = 1, 2, \ldots \tag{8.2}$$

where

$$p = \sum_{i=0}^{\infty}p_i\,s(0,i). \tag{8.3}$$

The  observation  r  represents  the  number  surviving  when  the  original 
observation  n  is  subject  to  a  destructive  process  which  reduces  n  to  r  with 
probability  s(r,n).  Such  a  situation  arises  when  we  consider  observations  on 
family  size  counting  only  the  surviving  children  (R).  The  problem  is  to 
determine  the  distribution  of  N,  the  original  family  size,  knowing  the 
distribution of R and assuming a suitable survival distribution.

Suppose that N ~ P(λ), i.e., distributed as Poisson with parameter λ, and let R ~ B(N, π), i.e., binomial with parameters N and π. Then

$$p_r^{*} = \frac{e^{-\lambda\pi}(\lambda\pi)^{r}}{r!}\div\left(1-e^{-\lambda\pi}\right), \qquad r = 1, 2, \ldots \tag{8.4}$$

It is seen that the parameters λ and π get confounded, so that knowing the distribution of R, we cannot find the distribution of N. Similar confounding occurs when N follows a binomial, negative binomial or logarithmic series distribution. When the survival distribution is binomial, Sprott (1968) gives a general class of distributions which has this property. What additional information is needed to recover the original distribution? For instance, if we know which of the observations in the sample did not suffer damage, then it is possible to estimate the original distribution as well as the binomial parameter π.
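The confounding in (8.4) can be seen numerically by evaluating (8.2) and (8.3) directly for two (λ, π) pairs with the same product λπ = 2. The sketch below (helper names hypothetical) truncates the sum over n at 200, which is far into the Poisson tail for these parameters; both pairs give the same zero-truncated distribution of R.

```python
from math import comb, exp, factorial, lgamma, log


def poisson_pmf(n, lam):
    """Poisson probabilities on the log scale, to avoid overflow for large n."""
    return exp(n * log(lam) - lam - lgamma(n + 1))


def damaged_truncated_pmf(r, lam, pi, n_max=200):
    """(8.2)-(8.3) for N ~ Poisson(lam) with survival s(r, n) = C(n, r) pi^r (1 - pi)^(n - r)."""
    p = sum(poisson_pmf(n, lam) * (1 - pi) ** n for n in range(n_max + 1))   # (8.3)
    total = sum(poisson_pmf(n, lam) * comb(n, r) * pi**r * (1 - pi) ** (n - r)
                for n in range(r, n_max + 1))
    return total / (1 - p)                                                     # (8.2)


def zero_truncated_poisson(r, m):
    """(8.4) with m = lam * pi."""
    return exp(-m) * m**r / factorial(r) / (1 - exp(-m))


for r in range(1, 6):
    print(r, damaged_truncated_pmf(r, 4.0, 0.5), damaged_truncated_pmf(r, 8.0, 0.25))
```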


It  is  interesting  to  note  that  observations  which  do  not  suffer  any  damage 
have  the  distribution 


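$$p_r^{w} = c\,p_r\,\pi^{r}, \qquad r = 1, 2, \ldots \tag{8.5}$$

which is a weighted distribution, with weight π^r (the probability that none of the r units is destroyed) and c a normalizing constant. If the original distribution is Poisson,

$$p_r^{w} = \frac{e^{-\lambda\pi}(\lambda\pi)^{r}}{r!}\div\left(1-e^{-\lambda\pi}\right) \tag{8.6}$$

which is the same as (8.4). It is shown in Rao and Rubin (1964) that the equality p_r^w = p_r^* characterizes the Poisson distribution.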
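The characterization can be illustrated (though of course not proved) by comparing the zero-truncated damaged distribution (8.2) with the undamaged distribution (8.5) under binomial survival. For a Poisson original they coincide; for a geometric original, chosen here only as a contrast, they do not. The helper names are hypothetical.

```python
from math import comb, exp, lgamma, log


def truncated_damaged(p, pi, r_max):
    """(8.2)-(8.3) with binomial survival; p lists p_0, p_1, ..., p_{n_max}."""
    p0 = sum(pn * (1 - pi) ** n for n, pn in enumerate(p))                       # (8.3)
    return [sum(p[n] * comb(n, r) * pi**r * (1 - pi) ** (n - r) for n in range(r, len(p)))
            / (1 - p0) for r in range(1, r_max + 1)]                             # (8.2)


def truncated_undamaged(p, pi, r_max):
    """(8.5): c * p_r * pi^r for r >= 1, with c fixed so the probabilities sum to one."""
    c = 1 / sum(p[r] * pi**r for r in range(1, len(p)))
    return [c * p[r] * pi**r for r in range(1, r_max + 1)]


def max_gap(p, pi, r_max=10):
    d, u = truncated_damaged(p, pi, r_max), truncated_undamaged(p, pi, r_max)
    return max(abs(a - b) for a, b in zip(d, u))


n_max = 200
poisson = [exp(n * log(4.0) - 4.0 - lgamma(n + 1)) for n in range(n_max + 1)]   # lambda = 4
geometric = [0.5 * 0.5**n for n in range(n_max + 1)]                            # contrast case
print(max_gap(poisson, 0.5), max_gap(geometric, 0.5))
```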


The damage models of the type described above were introduced in Rao (1965).
For  theoretical  developments  on  damage  models  and  characterization  of 
probability  distributions  arising  out  of  their  study,  the  reader  is  referred 
to  Alzaid,  Rao  and  Shanbhag  (1984). 


9.  NONRESPONSE:  THE  STORY  OF  AN  EXTINCT  RIVER 


Sample survey practitioners define nonresponse as a missing observation or nonavailability of measurements on a unit included in a sample. It is clear that if the missing values can be considered as a random sample from the population under survey, then the observed values constitute a representative sample of the whole population (Rubin, 1976). Usually this is not the case, and special techniques are developed in sample surveys to cope with such situations.

In general, nonresponse poses serious issues, such as the problem of broken skulls not providing direct measurements on capacity (see Section 4 of this paper). More complex cases are as follows.

For  instance,  we  may  try  to  estimate  the  underground  resources  in  a  given 
region  by  making  borings  at  a  randomly  chosen  set  of  points  and  taking  some 
measurements.  But  it  may  so  happen  that  borings  cannot  be  made  at  some  chosen 
points for reasons such as the presence of rocks. The measurements at
such  points  may  be  of  a  different  type  from  the  rest  in  which  case  the 
observed  sample  will  not  be  a  representative  sample  from  the  whole  region. 

Such a problem arose in an investigation by geologists at the Indian Statistical Institute to estimate the mean direction of flow of an extinct river of geological times in a certain region (see Sengupta, 1966; J. S. Rao and Sengupta, 1966). The geologists collected a series of observations on direction cosines of flow (two-dimensional vector data), which seemed ideal for an application of Fisher's (1953) distribution and the associated theory for estimation of the mean direction of flow. Then the question arose as to what the hypothetical population was from which the observations could be considered as a random sample. It appeared that the measurements on direction cosines could not be made at any chosen point, but only at certain points where there was rock formation with some markings, known as "outcrops." The geologists walked along the region under exploration and made measurements wherever they came across outcrops. If the outcrops had been uniformly distributed over space, then it might have been possible to define a population of which the observations made by the geologists could be a representative sample. The locations at which observations were made, when plotted on a topographical map of the region, showed an unequal distribution of outcrops in different areas of the region, indicating the nonrandom nature of the occurrence of outcrops. In such a case, the estimate of mean direction obtained by treating each observation as an independent sample with a common expectation will be biased. In order to minimize the bias in estimation, the following method of estimation was adopted. A square lattice was imposed on the topographical map and the measurements in each grid cell were replaced by their average. Then a simple average of these averages was taken as an estimate of the mean direction of flow. This estimate differed somewhat from the average of all the measurements and was considered to have less bias.
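The grid-averaging estimator described above can be sketched as follows, assuming each measurement is recorded as a map location and a flow angle in degrees, and that the cell averages of the direction cosines are used without renormalizing them to unit length (the text does not say which was done). The function names and the toy data, nine outcrops in one cell and a single outcrop elsewhere, are hypothetical.

```python
import math
from collections import defaultdict


def mean_direction(vectors):
    """Direction (degrees) of the resultant of a list of (cos, sin) pairs."""
    cx = sum(v[0] for v in vectors)
    cy = sum(v[1] for v in vectors)
    return math.degrees(math.atan2(cy, cx))


def grid_mean_direction(observations, cell_size):
    """Average the direction cosines within each grid cell, then average the cell averages.

    observations: iterable of (x, y, angle_degrees), with (x, y) the map location.
    """
    cells = defaultdict(list)
    for x, y, angle in observations:
        key = (math.floor(x / cell_size), math.floor(y / cell_size))
        a = math.radians(angle)
        cells[key].append((math.cos(a), math.sin(a)))
    cell_means = [(sum(c for c, _ in vs) / len(vs), sum(s for _, s in vs) / len(vs))
                  for vs in cells.values()]
    return mean_direction(cell_means)


# Toy data: nine outcrops clustered in one cell, one outcrop in a distant cell.
obs = [(0.1 * i, 0.1, 0.0) for i in range(9)] + [(5.5, 5.5, 90.0)]
naive = mean_direction([(math.cos(math.radians(a)), math.sin(math.radians(a)))
                        for _, _, a in obs])
print(naive, grid_mean_direction(obs, cell_size=1.0))
```

The naive average is dominated by the cluster of outcrops, while the grid estimate gives each occupied cell equal weight.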

This  study  points  out  the  need  for  a  re-examination  of  data  on  directions  of 
rock  magnetism  collected  by  geologists  and  analysed  by  Fisher  (1953)  who 
developed  a  special  theory  for  that  purpose.  If  the  outcrops  at  which 
measurements  of  direction  are  possible  are  not  uniformly  distributed  over 
space,  then  there  will  be  some  difficulty  in  interpreting  the  observed  mean 
direction  as  an  estimate  of  some  specific  parameter. 

10. CONCLUSIONS

Some  of  the  broad  conclusions  that  emerge  from  the  discussion  of  the  live 
examples  in  the  paper  are  as  follows: 


Specification  or  the  choice  of  a  model  is  of  great  value  in  data  analysis. 
An  appropriate  specification  for  given  data  can  be  arrived  at  on  the  basis  of 
past  experience,  information  on  the  stochastic  nature  of  events,  a  detailed 
knowledge  of  how  observations  are  ascertained  and  recorded,  and  an  exploratory 
analysis of current data itself using graphical displays, preliminary tests
and  cross  validation  studies. 

Inaccuracies in specification can lead to wrong inference. It is therefore worthwhile to review the data under different possible specifications (models) to determine how much the conclusions could vary.

What  population  does  an  observed  sample  represent?  What  is  the  widest 
possible  universe  to  which  the  conclusions  drawn  from  a  sample  apply?  The 
answers  depend  on  how  the  observations  are  ascertained  and  what  the 
deficiencies  in  data  are  in  terms  of  nonresponse,  measurement  errors,  and 
contamination. 

Every data set has its own unique features, which may be revealed in initial scrutiny of the data and/or during statistical analysis, and which may have to be taken into account in interpreting the data. Routine data analysis based on textbook methods or software packages may be misleading.

Generally  in  scientific  investigations,  a  specific  question  cannot  be 
answered  without  knowing  the  answers  to  several  other  questions.  It  often 
pays  to  analyse  the  data  to  throw  light  on  a  broader  set  of  relevant  and 
related  questions. 


What  data  should  be  collected  to  answer  specific  questions?  Lack  of 
information  on  certain  aspects  may  create  undue  complications  in  applying 
statistical  methods  and/or  restrict  the  nature  of  conclusions  drawn  from 
available data. Attempts should be made to collect information on concomitant
variables  to  the  extent  possible,  whose  use  can  enhance  the  precision  of 
estimators  of  unknown  parameters,  and  provide  broader  validity  to  statistical 
inference. 